“FUTURE TELLING”: A META-ANALYSIS OF
FORCED-CHOICE PRECOGNITION 
EXPERIMENTS, 1935-1987 


By CHARLES HONORTON AND DIANE C. FERRARI


ABSTRACT: We report a meta-analysis of forced-choice precognition experiments 
published in the English-language parapsychological literature between 1935 and 
1987. These studies involve attempts by subjects to predict the identity of target 
stimuli selected randomly over intervals ranging from several hundred milli- 
seconds to one year following the subjects’ responses. We retrieved 309 studies 
reported by 62 investigators. Nearly two million individual trials were contributed 
by more than 50,000 subjects. Study outcomes are assessed by overall level of
statistical significance and effect size. There is a small but reliable overall effect
(z = 11.41, p = 6.3 × 10⁻²⁵). Thirty percent of the studies (by 40 investigators) are
significant at the 5% significance level. Assessment of vulnerability to selective re- 
porting indicates that a ratio of 46 unreported studies averaging null results would 
be required for each reported study in order to reduce the overall result to nonsig- 
nificance. No systematic relationship was found between study outcomes and eight 
indices of research quality. Effect size has remained essentially constant over the 
survey period, whereas research quality has improved substantially. Four moder- 
ating variables appear to covary significantly with study outcome: Studies using 
subjects selected on the basis of prior testing performance show significantly larger 
effects than studies using unselected subjects. Subjects tested individually by an 
experimenter show significantly larger effects than those tested in groups. Studies 
in which subjects are given trial-by-trial or run-score feedback have significantly 
larger effects than those with delayed or no subject feedback. Studies with brief 
intervals between subjects’ responses and target generation show significantly 
stronger effects than studies involving longer intervals. The combined impact of 
these moderating variables appears to be very strong. Independently significant 
outcomes are observed in seven of the eight studies using selected subjects, who 
were tested individually and received trial-by-trial feedback. 


Precognition refers to the noninferential prediction of future events. Anecdotal claims of “future telling” have occurred throughout human history in virtually every culture and period. Today such claims are generally believed to be based on factors such as delusion, irrationality, and superstitious thinking. The concept of precognition runs counter to accepted notions of causality and appears to conflict with current scientific theory. Nevertheless, over the past half-century a substantial number of experiments have been reported claiming empirical support for the hypothesis of precognition. Subjects in forced-choice experiments, according to many reports, have correctly predicted to a statistically significant degree the identity (or order) of target stimuli randomly selected at a later time.
We performed a meta-analysis of forced-choice precognition experiments published in the English-language research literature between 1935 and 1987. Four major questions were addressed through this meta-analysis: (1) Is there overall evidence for accurate target identification (above-chance hitting) in experimental precognition studies? (2) What is the magnitude of the overall precognition effect? (3) Is the observed effect related to variations in methodological quality that could allow a more conventional explanation? (4) Does precognition performance vary systematically with potential moderating variables, such as differences in subject populations, stimulus conditions, experimental setting, knowledge of results, and time interval between subject response and target generation?


DELINEATING THE DOMAIN


Retrieval of Studies


Parapsychological research is still academically taboo, and it is likely that there have been many dissertations and theses in this area that have escaped publication. Our retrieval of studies for this meta-analysis is therefore based on the published literature. The studies include all forced-choice precognition experiments appearing in the peer-reviewed English-language parapsychology journals: Journal of Parapsychology, Journal (and Proceedings) of the Society for Psychical Research, Journal of the American Society for Psychical Research, European Journal of Parapsychology (including the Research Letter of the Utrecht University Parapsychology Laboratory), and abstracts of peer-reviewed papers presented at Parapsychological Association meetings published in Research in Parapsychology.


Criteria for Inclusion 


Our review is restricted to fixed-length studies in which signifi- 
cance levels and effect sizes based on direct hitting can be calcu- 


A Meta-Analysis of Forced-Choice Precognition Experiments 283 


lated. Studies using outcome variables other than direct hitting, such 


as run-score variance and displacement effects, are included only if 


the report provides relevant information on direct hits (i.e., number
of trials, hits, and probability of a hit). Finally, we exclude studies
conducted by two investigators, S. G. Soal and Walter J. Levy, whose
work has been unreliable.
Many published reports contain more than one experiment or
experimental unit. In experiments involving multiple conditions,
significance levels and effect sizes are calculated for each condition.


Outcome Measures


Significance level. Significance levels (z scores) were calculated for each study from the reported number of trials, hits, and probability of success using the normal approximation to the binomial distribution with continuity correction. Positive z scores indicate above-chance scoring, and negative z scores reflect below-chance scoring.

Effect size. Because most parapsychological experiments, particularly those in the older literature, have used the trial rather than the subject as the sampling unit, we use a trial-based estimator of effect size. The effect size (ES) for each study is the z score divided by the square root of the number of trials in the study.¹
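The two outcome measures can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the continuity correction is assumed to move the absolute deviation 0.5 toward zero (never past it). Cohen's h is included only to show why, as the first footnote explains, h and z/√N nearly coincide: ignoring the correction, z/√N equals (p̂ − p)/√(p(1 − p)), which is the first-order approximation of h.

```python
import math

# Illustrative sketch only; function names are hypothetical.


def binomial_z(hits: int, trials: int, p_hit: float) -> float:
    """z score for `hits` out of `trials` with chance hit probability `p_hit`.

    Normal approximation to the binomial; the continuity correction moves the
    observed deviation 0.5 toward zero (an assumed convention).
    """
    expected = trials * p_hit
    sd = math.sqrt(trials * p_hit * (1.0 - p_hit))
    deviation = hits - expected
    if deviation > 0:
        deviation = max(deviation - 0.5, 0.0)
    elif deviation < 0:
        deviation = min(deviation + 0.5, 0.0)
    return deviation / sd  # positive = above-chance, negative = below-chance


def effect_size(z: float, trials: int) -> float:
    """Trial-based effect size: ES = z / sqrt(N)."""
    return z / math.sqrt(trials)


def cohens_h(hits: int, trials: int, p_hit: float) -> float:
    """Cohen's h: difference between arcsine-transformed hit proportions."""
    return 2 * math.asin(math.sqrt(hits / trials)) - 2 * math.asin(math.sqrt(p_hit))


# Hypothetical study: 1,000 card-guessing trials, five symbols (p = .2), 225 hits.
z = binomial_z(225, 1000, 0.2)
es = effect_size(z, 1000)
h = cohens_h(225, 1000, 0.2)
print(f"z = {z:.3f}, ES = {es:.4f}, h = {h:.4f}")
```

For this made-up study z is about 1.94, and ES and h both come out near 0.061, illustrating how closely the two indices track each other.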




General Characteristics of the Domain 



We located 309 studies in 113 separate publications. These studies were contributed by 62 different senior authors and were published over a 53-year period, between 1935 and 1987. Considering the half-century time-span over which the precognition experiments were conducted, it is not surprising that the studies are very diverse.

The database comprises nearly two million individual trials and more than 50,000 subjects. Study sample sizes range from 250 to 297,060 trials (median = 1,194). The number of subjects ranges from 1 to 29,706 (median = 16). The studies use a variety of methodologies, ranging from guessing ESP cards and other card symbols to automated random number generator experiments. The domain encompasses diverse subject populations: the most frequently used


¹ Elsewhere (Honorton, 1985), we have used the effect size index Cohen’s h (Cohen, 1977), and one referee has asked that we explain why we are now using z/√N. The answer is that h and z/√N yield virtually identical results, and z/√N is computationally simpler. For the present sample of 309 precognition studies, the mean difference between the two indices is .00047, and the standard deviation of the difference is .026: t(308) = 0.312, p = .756, two-tailed. The correlation between the two indices …
TABLE 1
OVERALL SIGNIFICANCE LEVEL AND EFFECT SIZE

                                    z       ES
Mean                              0.65    0.020
SD                                2.68    0.100
Lower 95% confidence estimate     0.40    0.011

Combined z = 11.41, p = 6.3 × 10⁻²⁵
“Fail-safe N” = 14,268
t(ES) = 3.51, 308 df, p = .00025

population is students (in approximately 40% of the studies); the 
least frequently used populations are the experimenters themselves 
and animals (each used in about 5% of the studies). 

Though a few studies tested subjects through the mail, more typ- 
ically subjects were tested in person, either individually or in groups. 
Target selection methods included no randomization at all (studies
using “quasi-random” naturalistic events), informal methods includ- 
ing manual card-shuffling or dice-throwing, and formal methods, 
primarily random number tables or random number generators. 
The time interval between the subjects’ responses and target gen- 
eration varied from less than one second to one year. 


OVERALL CUMULATION 


Evidence for an overall effect is strong. As shown in the top part 
of Table 1, the overall results are highly significant.² Lower bound
(one-tailed) 95% confidence estimates of the mean z score and ES
are displayed in the bottom portion of Table 1.

Ninety-two studies (30%) show significant hitting at the 5% level,
and significant outcomes are contributed by 40 different investigators.
The z scores correlate significantly with sample size: r(307) =
.156, p = .003. The mean number of trials for significant studies is
34% larger than the mean number of trials for nonsignificant studies.

² The statistical analyses presented here were performed using SYSTAT (Wilkinson, 1988). When t tests are reported on samples with unequal variances, they are calculated using the separate variances within groups for the error and degrees of freedom following Brownlee (1965). Unless otherwise specified, p levels are one-tailed. Combined z’s are based on Stouffer’s method (Rosenthal, 1984).
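The summary quantities named in this footnote and in Table 1 (Stouffer’s combined z, the one-tailed lower 95% confidence estimate, and the t test of mean ES) can be sketched as follows. These are hypothetical helpers, not SYSTAT routines, and computing the lower estimate as mean − 1.645·SD/√k is an assumption (a t critical value gives nearly the same result at k = 309).

```python
import math
import statistics

# Illustrative sketch only; helper names are hypothetical.


def stouffer_z(z_scores):
    """Stouffer's combined z: the sum of study z scores divided by sqrt(k)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def lower_95_bound(values):
    """One-tailed lower 95% confidence estimate of a mean (critical value 1.645)."""
    k = len(values)
    return statistics.mean(values) - 1.645 * statistics.stdev(values) / math.sqrt(k)


def one_sample_t(values):
    """t statistic for the mean against zero, with k - 1 degrees of freedom."""
    k = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(k))


# The same formulas applied to the Table 1 summary statistics (k = 309 studies).
k = 309
combined_z = 0.65 * math.sqrt(k)                 # (mean z * k) / sqrt(k)
lower_mean_z = 0.65 - 1.645 * 2.68 / math.sqrt(k)
t_es = 0.020 / (0.100 / math.sqrt(k))            # 308 df
```

From the rounded Table 1 values these give about 11.43, 0.40, and 3.52, agreeing with the published 11.41, 0.40, and 3.51 to within the rounding of the inputs.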
Figure 1. Mean effect size by investigator. N = 62 investigators.


Replication Across Investigators 


Virtually the same picture emerges when the cumulation is based on investigator rather than study as the unit of analysis; the combined z is 12.13, and 23 of the 62 investigators (37%) have overall outcomes significant at the 5% level. The mean (investigator) effect size is 0.033 (SD = .093).

There is a significant difference in the mean ES across investigators, but it is surprisingly small: Kruskal-Wallis one-way ANOVA by ranks, χ²(61) = 82.71, p = .034. The effect is clearly not due to a few major contributors. If investigators contributing more than three studies are eliminated, leaving 33 investigators, the combined z is still 6.00 (p = 1.25 × 10⁻⁹) and the mean ES is .028 (SD = .091). Figure 1 shows the mean effect sizes by investigator.
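Kruskal-Wallis one-way ANOVA by ranks is used repeatedly in this report (across investigators, publication sources, entry-point methods, and subject populations). The sketch below is a minimal standard-library illustration, not the SYSTAT routine the authors used. It computes H with the usual tie correction and refers it to a chi-square distribution with (number of groups − 1) degrees of freedom, using the closed-form upper-tail series that exists for integer degrees of freedom. All function names are hypothetical.

```python
import math
from itertools import chain

# Illustrative sketch only; function names are hypothetical.


def average_ranks(values):
    """1-based ranks of `values`; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H (tie-corrected) for a list of samples, e.g. ES by investigator."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    return h / (1 - tie_term / (n ** 3 - n))


def chi2_sf(x, df):
    """Upper-tail chi-square probability for a positive integer df."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 0.0
        for k in range(df // 2):
            total += term
            term *= (x / 2) / (k + 1)
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2) sum_k x^k / (1*3*...*(2k+1))
    term, total = 1.0, 0.0
    for k in range((df - 1) // 2):
        total += term
        term *= x / (2 * k + 3)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
```

Given per-study ES values grouped by investigator, `chi2_sf(kruskal_wallis_h(groups), len(groups) - 1)` would give the p level. As a check on the tail function alone, `chi2_sf(82.71, 61)` comes out close to the reported .034. The tie correction divides by zero if every value is identical, which cannot happen with real ES data.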

These results indicate substantial cross-investigator replicability
and directly contradict the claim of critics such as Akers (1987) that


286 The Journal of Parapsychology 


successful parapsychological outcomes are achieved by only a few
investigators.


The Filedrawer Problem

…ences favoring publication of “significant” studies (e.g., Sterling, 1959). The extreme view of this “filedrawer problem” is that “the journals are filled with the 5% of the studies that show Type I errors, while the filedrawers back at the lab are filled with the 95% of the studies that show nonsignificance...” (Rosenthal, 1984, p. 108). Recognizing the importance of this problem, the Parapsychological Association in 1975 adopted an official policy against selective reporting of positive results.³ Examination of the parapsychological literature shows that nonsignificant results are frequently published, and, in the precognition database, 70% of the studies have reported nonsignificant results. Nevertheless, 75% of the precognition studies were published before 1975, and we must ask to what extent selective publication bias could account for the cumulative effects we observe.

The central section of Table 1 uses Rosenthal’s (1984) “fail-safe N” statistic to estimate the number of unreported studies with z scores averaging zero that would be necessary to reduce the known database to nonsignificance. The filedrawer estimate indicates that over 46 unreported studies must exist for each reported study to reduce the cumulative outcome to a nonsignificant level.
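Rosenthal’s fail-safe N follows directly from Stouffer’s method: adding X null studies (mean z = 0) leaves Σz unchanged but raises the divisor to √(k + X), so setting Σz/√(k + X) equal to the one-tailed .05 criterion of 1.645 and solving gives X = (Σz)²/1.645² − k. A minimal sketch (function names hypothetical; the 1.645 cutoff is an assumption about the criterion used):

```python
import math

# Illustrative sketch only; function names are hypothetical.


def fail_safe_n(z_scores, z_alpha=1.645):
    """Unreported null studies needed to pull the Stouffer z down to z_alpha.

    Solves sum(z) / sqrt(k + X) = z_alpha for X.
    """
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_alpha ** 2 - k


def fail_safe_n_from_combined(combined_z, k, z_alpha=1.645):
    """Same quantity when only the combined z and the number of studies are known."""
    sum_z = combined_z * math.sqrt(k)
    return sum_z ** 2 / z_alpha ** 2 - k


approx = fail_safe_n_from_combined(11.41, 309)
```

From the rounded Table 1 summary this gives roughly 14,560 (about 47 per reported study), close to but not identical with the published 14,268; the exact computation behind the table entry is not given in this section, so the sketch should be read as an illustration of the formula only.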
A different approach to the filedrawer problem is described by Dawes, Landman, and Williams (1984; personal communication from Dawes to Honorton, July 14, 1988). Their truncated normal curve analysis, like Rosenthal’s “fail-safe N,” is based on normal curve assumptions. Their null hypothesis is that z scores above some critical level (e.g., z = 1.65, 1.96, etc.) are randomly sampled from N(0,1) above that critical level. The alternative to the null hypothesis is that, because there is some real effect, the distribution of z’s is shifted to the right of 0 and the z’s will be larger than predicted by the null. For a critical level of z = 1.65, the expected mean z is 2.06 and the variance is .14. In the precognition database, there are 92 studies with z’s > 1.65. Their average is 3.61, not 2.06 as predicted by the null hypothesis. Since the variance of the normal truncated above 1.65 is .14, the test z (using the Central Limit Theorem) comparing 3.61 to 2.06 is 39.84 [1.55 divided by √(.14/92)]. Here, p is virtually zero. Similar results are found with cut points of 1.96, 2.33, and 2.58.
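The truncated-normal test just described needs only the mean and variance of a standard normal restricted to values above the cut point c: mean λ = φ(c)/(1 − Φ(c)) (the inverse Mills ratio) and variance 1 + cλ − λ². A minimal standard-library sketch (function names hypothetical):

```python
import math

# Illustrative sketch only; function names are hypothetical.


def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def normal_sf(x):
    """Upper-tail probability of the standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def truncated_moments(cut):
    """Mean and variance of N(0, 1) restricted to values above `cut`."""
    lam = normal_pdf(cut) / normal_sf(cut)  # inverse Mills ratio
    return lam, 1 + cut * lam - lam ** 2


def truncation_test_z(observed_mean, n_studies, cut):
    """CLT test of whether the mean of the significant z's exceeds the null mean."""
    mean0, var0 = truncated_moments(cut)
    return (observed_mean - mean0) / math.sqrt(var0 / n_studies)


mean0, var0 = truncated_moments(1.65)
z_test = truncation_test_z(3.61, 92, 1.65)
```

At c = 1.65 this gives a mean of about 2.07 and a variance of about 0.14, matching the values quoted above; with these unrounded moments the test z for 92 studies averaging 3.61 is about 39.9, in line with the reported 39.84.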
On the basis of these analyses, we conclude that the cumulative significance of the precognition studies cannot satisfactorily be explained by selective reporting.


OUTLIER REDUCTION


Although the overall z scores and effect sizes cannot reasonably be attributed to chance, inspection of the standard deviations in Table 1 indicates that the study outcomes are extremely heterogeneous. Given the diversity of methods, subject populations, and other study features that characterize this research domain, this is not surprising.

Although a major objective of this meta-analysis is to account for the variability across studies by blocking on differences in study quality, procedural features, and sampling characteristics, the database clearly contains extreme outliers. The z scores range from −5.1 to 19.6, a 25-sigma spread! The standardized index of kurtosis (g₂) is 9.47, suggesting that the tails of the distribution are much too long for a normal distribution.

We eliminated the extreme outliers by performing a “10 percent trim” on the study z scores (Barnett & Lewis, 1978). This involves eliminating studies with z scores in the upper and lower 10% of the distribution, and results in an adjusted sample of 248 studies. The trimmed z scores range from −2.24 to 3.21 (g₂ = −1.1). The revised z scores and effect sizes are presented in Table 2.
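The trimming step and the kurtosis index can be sketched as below. Two assumptions should be flagged: g₂ is computed here as the simple moment ratio m₄/m₂² − 3, whereas the package used may apply a small-sample adjustment, and the rounding rule for the 10% cut is a guess. With `int(309 * 0.10)` the sketch removes 30 studies from each tail and keeps 249, not the 248 reported, so the authors evidently handled the fractional cut or ties somewhat differently.

```python
# Illustrative sketch only; function names and rounding rule are assumptions.


def excess_kurtosis(values):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal curve)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3


def trim_studies(z_scores, proportion=0.10):
    """Drop the lowest and highest `proportion` of studies ranked by z."""
    ordered = sorted(z_scores)
    cut = int(len(ordered) * proportion)  # truncating rule is an assumption
    return ordered[cut:len(ordered) - cut]
```

Positive g₂ values signal tails heavier than the normal curve, which is why the untrimmed value of 9.47 motivated the trim; after trimming the reported index turns slightly negative.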

Elimination of extreme outliers reduces the combined z score by approximately one half, but the outcomes remain highly significant. Twenty-five percent of the studies (62/248) show overall significant hitting at the 5% level. Lower bound confidence estimates show that the mean z’s and effect sizes are above 0 at the 95% confidence level.

Elimination of outliers reduces the total number of investigators from 62 to 57, but the results remain basically the same when the analyses are based on investigators rather than studies. The combined z is 6.84; 18 of the 57 investigators (31.6%) have overall significant outcomes at the 5% level. The mean (investigator) ES is 0.020 (SD = .05).

TABLE 2
SIGNIFICANCE LEVEL AND EFFECT SIZE FOR TRIMMED SAMPLE

                                    z       ES
Mean                              0.38    0.012
SD                                1.45    0.065
Lower 95% confidence estimate     0.23    0.005

Combined z = 6.02, p = 1.1 × 10⁻⁹
t(ES) = 2.90, 247 df, p = .002

For the trimmed sample, the difference in ES across investigators is not significant: Kruskal-Wallis one-way ANOVA by ranks, χ²(56) = 59.34, p = .355. If investigators contributing more than three studies are eliminated, leaving 37 investigators, the combined z is still 5.00 (p = 3.0 × 10⁻⁷) and the mean ES is 0.022 (SD = .156). Figure 2 shows the mean effect size by investigator.

Thus, elimination of the outliers does not substantially affect the conclusions drawn from our analysis of the database as a whole. There clearly is a nonchance effect. In the remainder of this report, we use the trimmed sample to examine covariations between effect size and a variety of methodological and other study features.


STUDY QUALITY



Because target stimuli in precognition experiments are selected only after the subjects’ responses have been registered, precognition studies are usually not vulnerable to sensory leakage problems. Other potential threats to validity must, however, be considered. The problem of variations in research quality remains a source of controversy in meta-analysis. Some meta-analysts advocate eliminating low quality studies whereas others recommend empirically assessing the impact of variations in quality on study outcome. Rosenthal (1984) points out that the practice of discarding studies is equivalent to assigning them weights of zero, and he recommends weighting study z scores in relation to ratings of research quality.


Study Quality Criteria 


Ideally, the assessment of study quality should be performed by knowledgeable specialists who are blind to the study outcomes. In practice, this is usually not feasible, particularly when, as in the present case, large numbers of studies are involved. For our analysis of study quality, statistical and methodological variables are defined and coded in terms of procedural descriptions (or their absence) in the research reports. This approach was used in an earlier meta-analysis of psi ganzfeld research (Honorton, 1985), and it led to study quality ratings that were generally in agreement, r(26) = .768, p = 10⁻⁶, with independent “flaw” ratings by an outside critic (Hyman, 1985).

Figure 2. Mean effect size by investigator for trimmed sample. N = 57 investigators.
One point is given (or withheld) for each of the following eight
criteria:
Specification of sample size. Does the investigator preplan the number of trials to be included in the study or is the study vulnerable to the possibility of optional stopping? Credit is given to reports that explicitly specify the sample size. Studies involving group testing, in which it is not feasible to specify the sample size precisely, are also given credit. No credit is given to studies in which the sample size is either not preplanned or not addressed in the experimental report.

Preplanned analysis. Is the method of statistical analysis, including the outcome (dependent variable) measure, preplanned? Credit is given to studies explicitly specifying the form of analysis and the outcome measure. No credit is given to those not explicitly stating the form of the analysis or those in which the analysis is clearly post hoc.

Randomization method. Credit is given for use of random number 
tables, random number generators, and mechanical shufflers. No 
credit is given for failure to randomize (i.e., use of “quasi-random 
naturalistic events”) or for informal methods such as hand-shuffling, 
die-casting, and drawing lots. 

Controls. Credit is given to studies reporting randomness control
checks, such as random number generator (RNG) control series and 
empirical cross-check controls. 

Recording. One point is allotted for automated recording of tar- 
gets and responses, and another for duplicate recording. 

Checking. One point is allotted for automated checking of 
matches between target and response, and another for duplicate 
checking of hits. 
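The eight criteria amount to a simple additive score from 0 to 8. As a minimal sketch (the field names are hypothetical labels for the criteria above, not a coding sheet from the study):

```python
from dataclasses import dataclass, fields

# Illustrative sketch only; field names are hypothetical labels for the criteria.


@dataclass
class QualityCoding:
    """One study's coding on the eight criteria (True = credit given)."""
    sample_size_specified: bool = False
    preplanned_analysis: bool = False
    formal_randomization: bool = False   # RNT, RNG, or mechanical shuffler
    randomness_controls: bool = False
    automated_recording: bool = False
    duplicate_recording: bool = False
    automated_checking: bool = False
    duplicate_checking: bool = False

    def score(self) -> int:
        """Quality weight: one point per credited criterion (0 to 8)."""
        return sum(int(getattr(self, f.name)) for f in fields(self))


study = QualityCoding(sample_size_specified=True, formal_randomization=True,
                      automated_recording=True)
```

Keeping each criterion as a separate flag, rather than storing only the total, is what allows the per-criterion correlations with ES as well as the overall quality weight.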




Study Quality Analysis 


Each study received a quality weight between 0 and 8 (mean = 3.3, SD = 1.8). We find no significant relationship between study quality and ES: r(246) = .081, p = .202, two-tailed. This slight, nonsignificant tendency for study outcomes to correlate positively with study quality has the consequence that the quality-weighted z score of 6.26 is slightly larger than the unweighted z of 6.02. Table 3 shows the correlations between effect size and each of the eight individual quality measures.⁴ The mean effect sizes by quality level are displayed graphically in Figure 3.
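A quality-weighted z of the kind reported above can be sketched with the weighted form of Stouffer’s method, Σwz/√(Σw²). That this is the exact formula behind the 6.26 figure is an assumption, as is using the 0–8 quality ratings directly as weights. The sketch also makes Rosenthal’s point concrete: a study given weight zero simply drops out.

```python
import math

# Illustrative sketch only; the function name and weighting choice are assumptions.


def weighted_stouffer_z(z_scores, weights):
    """Weighted combined z: sum(w * z) / sqrt(sum(w ** 2)).

    A study with weight 0 contributes nothing, which is why discarding
    a study is equivalent to assigning it a weight of zero.
    """
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator
```

With equal weights the formula reduces to the ordinary Stouffer z, so any difference between the weighted and unweighted results reflects how outcomes line up with the quality ratings.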




⁴ The correlation between ES and study quality is also nonsignificant for the untrimmed sample of 309 studies: r(307) = −.060, p = .289. The quality-weighted z score is 7.38, p = 2.32 x 10°. However, three of the individual quality measures are significantly related to performance. Controls and duplicate checking correlate significantly positively with ES, and randomization correlates significantly negatively with ES. These correlations appear to be due to a few studies with z scores that are extreme outliers (z > 7). When the 10 studies with z > 7 are eliminated, the significant correlations between quality and ES disappear.


TABLE 3
CORRELATIONS BETWEEN EFFECT SIZE AND QUALITY MEASURES

Quality measure                      r(246)
Sample size specified in advance     −.100
Preplanned analysis                  −.001
Randomization                        −.011
Controls                              .058
Automated recording                   .169
Duplicate recording                   .047
Automated checking                    .136
Duplicate checking                    .078

: oO 
Quality Extremes ° 
co 

Is there a tendency for extremely weak studies to show largerp. 


effects than exceptionally “good” studies? Analysis on the extremesO 
of the quality ratings indicates that this is not the case. ‘ 

This analysis, based on the untrimmed sample of 309 studies, 
uses studies with quality ratings outside the interquartile range of O 
the rating distribution (median = 4, Q, = 2, Qs; = 5). There are 56 
“low-quality” studies (ratings of 0—1) and 35 “high-quality” studies © 
(ratings of 6~8). The high-quality studies have effect sizes that are 0 


R 


not significantly lower than the low-quality studies; the ES means S 
are 0.017 (SD = 0.063) and 0.037 (SD = 0.137), for the low- and© 
high-quality studies, respectively: 4(82) = —.92, p = .358, two- & 
tailed. “ @ 
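The footnote on statistical methods states that t tests on samples with unequal variances use the separate within-group variances and adjusted degrees of freedom following Brownlee (1965). We assume this corresponds to the familiar Welch-Satterthwaite approximation; a minimal sketch working from group summary statistics (the function name is hypothetical):

```python
import math

# Illustrative sketch only; assumes Brownlee's procedure matches Welch-Satterthwaite.


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Separate-variance t and its Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2   # squared standard errors of each mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```

For two groups of ten with means 1.0 and 0.0 and unit SDs it returns t ≈ 2.24 with 18 df, the same as the pooled test when sizes and variances are equal. Plugging in the rounded summary values above does not reproduce the published t(82) = −.92 exactly, so the sketch should be read as an illustration of the method only.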

“”) 
Quality Variation in Publication Sources


Precognition ES is not significantly related to source of publication: Kruskal-Wallis one-way ANOVA, χ²(4) = 0.78, p = .942. However, the sources of publication differ significantly in study quality: Kruskal-Wallis one-way ANOVA, χ²(4) = 17.19, p = .002. This is due largely to the lower quality of studies published in the Journal of the Society for Psychical Research and in Research in Parapsychology.


Study Quality in Relation to Year of Publication


Precognition effect size has remained constant over a half-century of research, even though the methodological quality of the research has improved significantly during this period. The correlation between ES and year of publication is −.071: t(307) = −1.25, p = .213, two-tailed. Study quality and year of publication are, however, positively and significantly correlated: r(246) = .282, p = 2 × 10⁻⁶, two-tailed.

Critics of parapsychology have long believed that evidence for parapsychological effects disappears as the methodological rigor increases. The precognition database does not support this belief.

Figure 3. Precognition effect size in relation to study quality, with 95% confidence limits. N = 248 studies.




“REAL-TIME” ALTERNATIVES TO PRECOGNITION 


Investigators have long been aware of the possibility that precognition
effects could be modeled without assuming either time reversal
or backward causality. For example, outcomes from studies with
targets based on indeterminate random number generators (RNGs)
could be due to a causal influence on the RNG—a psychokinetic 
(PK) effect—rather than information acquisition concerning its fu- 
ture state. In experiments with targets based on prepared tables of 
random numbers, the possibility exists that the experimenter or 
other randomizer may be the actual psi source, unconsciously using 
“real-time” ESP combined with PK to choose an entry point in the 
random number sequence that will significantly match the
“subject’s” responses. While the latter possibility may seem far-fetched,
it cannot be logically eliminated if one accepts the existing evidence 
for contemporaneous ESP and PK, and it has been argued that it is 
less far-fetched than the alternative of “true” precognition. 

Morris (1982) discusses models of experimental precognition 
based on “real-time” psi alternatives and methods for testing “true” 
precognition. In general terms, these methods constrain the selec- 
tion of the target sequence so as to eliminate nonprecognitive psi 
intervention. In the most common procedure, attributed to Mangan 
(1955), dice are thrown to generate a set of numbers that are math- 
ematically manipulated to obtain an entry point in the random num- 
ber table. This procedure is sufficiently complex “as to be appar- 
ently beyond the capacities of the human brain, thus ruling out PK 
because the ‘PKer’ would not know what to do even via ESP” (Mor- 
ris, 1982, p. 329). 

Two features of precognition study target determination procedures were coded to assess “real-time” psi alternatives to precognition: method of determining random number table entry point and use of Mangan’s method.

Methods of eliminating “real-time” psi alternatives have not been used in studies with random number generators and have only been used in a small number of studies involving randomization by hand-shuffling. These analyses are therefore restricted to studies using random number tables (N = 138).


Method of Determining RNT Entry Point 


The reports describe six different methods of obtaining entry points in random number tables. If the study outcomes were due to subjects’ precognitive functioning rather than to alternative psi modes on the part of the experimenter or the experimenter’s assistants, there should be no difference in mean effect size across the various methods used to determine the entry point. Indeed, our analysis indicates that the study effect sizes do not vary systematically as a function of method of determining the entry point: Kruskal-Wallis one-way ANOVA by ranks, χ²(5) = 7.32, p = .198.


Use of Mangan’s Method 


We find no significant difference in ES between studies using 
complex calculations of the type introduced by Mangan to fix the 
random number table entry point and those that do not use such 
calculations: t(45) = 0.38, p = .370, two-tailed.


MODERATING VARIABLES 


The stability of precognition study outcomes over a 50-year pe- 
riod, which we described earlier, is also bad news. It shows that in- 
vestigators in this area have yet to develop sufficient understanding 
of the conditions underlying the occurrence (or detection) of these 
effects to reliably increase their magnitude. We have identified four 
variables that appear to covary systematically with precognition ES:
(1) selected versus unselected subjects, (2) individual versus group 
testing, (3) feedback level, and (4) time interval between subject re- 
sponse and target generation. 

The analyses use the raw study z scores and effect sizes; we
found that this results in uniformly more conservative estimates of
relationships with moderating variables than when the analyses are 
based on quality-weighted z scores and effect sizes. 


Selected Versus Unselected Subjects 


Our meta-analysis identifies eight subject populations: unspecified
subject populations, mixtures of several different populations,
animals, students, children, “volunteers,” experimenter(s), and se- 
lected subjects. 

Effect size magnitude does not vary significantly across these 
eight subject populations: Kruskal-Wallis one-way ANOVA, χ²(7) =
10.90, p = .143. Effect sizes by subject population are displayed in 
Figure 4. 

However, studies using subjects selected on the basis of prior performance in experiments or pilot tests show significantly larger effects than studies using unselected subjects. As shown in Table 4, 60% of the studies with selected subjects are significant at the 5% level. The mean z score for these studies is 1.39 (SD = 1.40). The ES is significantly higher for selected-subjects studies than for studies with unselected subjects. The t test of the difference in mean ES is equivalent to a point-biserial correlation of .198.

Figure 4. Precognition effect size by subject population, with 95% confidence limits. N = 248 studies.
Does this difference result from less stringent controls in studies with selected subjects? The answer appears to be “No.” The average quality of studies with selected subjects is higher, though not significantly so, than that of studies using unselected subjects: t(27) = 1.51, p = .142, two-tailed. This result appears to reflect a general tendency toward increased rigor and more detailed reporting in studies with selected subjects.

TABLE 4
SELECTED VERSUS UNSELECTED SUBJECTS

                          Selected    Unselected
N studies                     25          223
Combined z                  6.89         4.04
Studies with p < .05         60%          21%
Mean ES                     .051         .008
SD (ES)                     .075         .063

t(246) = 3.16, p = .001

TABLE 5
INDIVIDUAL VERSUS GROUP TESTING

                          Individual    Group
N studies                     97          105
Combined z                  6.64         1.29
Studies with p < .05         30%          19%
Mean ES                     .021         .004
SD (ES)                     .060         .066

t(200) = 1.89, p = .03
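Several moderator comparisons report a two-group t together with its equivalent point-biserial correlation, r = t/√(t² + df). The sketch below (hypothetical function names) computes a pooled-variance t from summary statistics and performs that conversion. Fed the Table 4 summary values, the pooled t comes out at about 3.17, within rounding of the published 3.16, and the published t values for Tables 4 and 5 convert to r ≈ .198 and .132, the correlations given in the text.

```python
import math

# Illustrative sketch only; function names are hypothetical.


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t with pooled variance; df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


def point_biserial_from_t(t, df):
    """Convert a two-group t into the equivalent point-biserial r."""
    return t / math.sqrt(t ** 2 + df)


# Table 4 summary values: selected vs. unselected subjects.
t4, df4 = pooled_t(0.051, 0.075, 25, 0.008, 0.063, 223)
r4 = point_biserial_from_t(3.16, 246)   # published Table 4 t
r5 = point_biserial_from_t(1.89, 200)   # published Table 5 t
```

Expressing each difference as r puts moderators with very different group sizes on a common scale, which is why the text reports both statistics.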


Individual Versus Group Testing 


Subjects were tested in groups, individually, or through the mail. 
Studies in which subjects were tested individually by an experimen- 
ter have a significantly larger mean ES than studies involving group 
testing (Table 5). 

The t test of the difference is equivalent to a point-biserial correlation of .132, favoring individual testing. Of the studies with subjects tested individually, 30% are significant at the 5% level.

The methodological quality of studies with subjects tested indi- 
vidually is significantly higher than that of studies involving group 
testing: t(137) = 3.08, p = .003, two-tailed. This result is consistent
with the conjecture that group experiments are frequently con- 
ducted as “targets of opportunity” and may often be carried out 
hastily in an afternoon without the preparation and planning that 
go into a study with individual subjects that may be conducted over 
a period of weeks or months. 

Thirty-five studies were conducted through the mail. In these
studies, subjects completed the task at their leisure and mailed their 
responses to the investigator. These correspondence studies yield 
outcomes similar to those involving individual testing. The com- 
bined z score is 2.66, with a mean ES of 0.018 (SD = .082). Ten 
correspondence studies (25.7%) are significant at the 5% level. 

Eleven studies are unclassifiable with regard to experimental setting.
TABLE 6
FEEDBACK RECEIVED BY SUBJECTS

                                      Feedback of Results
                          None     Delayed    Run score    Trial-by-trial
N studies                   15        21          21             47
Combined z               −1.30      2.11        4.74           6.98
Studies with p < .05      0.0%     19.0%       33.3%          42.6%
Mean ES                  −.001      .009        .023           .035
SD (ES)                   .028      .036        .048           .072


Feedback 


A significant positive relationship exists between the degree of 
feedback subjects receive about their performance and precognitive 
effect size (Table 6). 

Subject feedback information is available for 104 studies. These
studies fall into four feedback categories: no feedback, delayed 
feedback (usually notification by mail), run-score feedback, and 
trial-by-trial feedback. We gave these categories numerical values 
between 0 and 3. Precognition effect size correlates .231 with feed- 
back level (102 df, p = .009). Of the 47 studies involving trial-by- 
trial feedback, 20 (42.6%) are significant at the 5% level. None of 
the studies without subject feedback are significant. 
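The 0-3 coding of feedback level and its correlation with effect size can be sketched as follows. The category-to-number mapping follows the ordering given in the text; the per-study records are hypothetical, and `pearson_r` is a plain hand-written Pearson correlation rather than any particular statistics package's routine.

```python
import math

# Ordinal coding of the four feedback categories, in the order given in the text.
FEEDBACK_LEVEL = {"none": 0, "delayed": 1, "run score": 2, "trial-by-trial": 3}


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-study records (feedback category, effect size); illustrative only.
studies = [("none", -0.010), ("delayed", 0.015), ("run score", 0.020),
           ("trial-by-trial", 0.050), ("trial-by-trial", 0.010), ("none", 0.005)]
levels = [FEEDBACK_LEVEL[category] for category, _ in studies]
effects = [es for _, es in studies]
print(f"r = {pearson_r(levels, effects):.3f}, df = {len(studies) - 2}")
```

Treating the categories as equally spaced scores is itself an assumption; a rank correlation would be a natural check if the spacing is in doubt.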

Feedback level correlates positively though not significantly with
research quality: r(102) = .173, p = .082, two-tailed. Inadequate
randomization is the most plausible source of potential artifacts in
studies with trial-by-trial feedback. We performed a separate analy-
sis on the 47 studies in this group. Studies using formal methods of
randomization do not differ significantly in mean ES from those
with informal randomization: t(15) = 0.67, p = .590, two-tailed.
Similarly, studies reporting randomness control data do not differ
significantly in ES from those not including randomness controls:
t(42) = 0.79, p = .436, two-tailed.


Time Interval 


The interval between the subject’s response and target selection
ranges from less than one second to one year. Information about
the time interval is available for 144 studies. This information,
however, is often imprecise. Our analysis of the relationship between
precognitive ES and time interval is therefore limited to seven broad
interval categories: milliseconds, seconds, minutes, hours, days,
weeks, and months. (Effect sizes by precognition interval are
displayed in Figure 5.)

Figure 5. Effect size by precognition interval, with 95% confidence limits. N = 144 studies.

Although it is confounded with degree of feedback, there is a
significant decline in precognition ES over increasing temporal
distance: r(142) = -.199, p = .017, two-tailed. The largest effects
occur over the millisecond interval: N = 31 studies, combined z =
6.03, mean ES = 0.045, SD = .073. The smallest effects occur over
periods ranging from a month to a year: N = 7, combined z =
0.53, mean ES = 0.001, SD = .049.
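The p values attached to correlations in this section can be checked approximately with the Fisher transformation, under which atanh(r) * sqrt(n - 3) is roughly standard normal when the true correlation is zero. This is a large-sample approximation, not necessarily the exact t-based test the authors used. Taking n = df + 2, it gives about .017 two-tailed for r(142) = -.199, and about .009 in its one-tailed form for the feedback correlation of .231 with 102 df.

```python
import math


def r_significance(r: float, df: int, two_tailed: bool = True) -> float:
    """Approximate p value for a Pearson r with df = n - 2.

    Large-sample approximation: atanh(r) * sqrt(n - 3) is treated as a
    standard normal deviate under the null hypothesis of zero correlation.
    """
    n = df + 2
    z = math.atanh(r) * math.sqrt(n - 3)
    one_tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))  # upper-tail normal area
    return 2 * one_tail if two_tailed else one_tail


print(round(r_significance(-0.199, 142), 3))                   # time interval
print(round(r_significance(0.231, 102, two_tailed=False), 3))  # feedback level
```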

Interestingly, the decline of precognition performance over in-
creasing temporal distances results entirely from studies using un-
selected subjects: r(122) = -.235, p = .009, two-tailed. Studies with
selected subjects show a nonsignificant positive relationship between
ES and time interval: r(18) = .077, p = .745, two-tailed. Although
the difference between these two correlations is not significant (z =
1.24), this pattern suggests that the origin of the decline over time
may be motivational rather than the result of some intrinsic physical
boundary condition. The relationship between precognition ES and
feedback also supports this conjecture. Nevertheless, any finding sug-
gesting potential boundary conditions on the phenomenon should
be vigorously pursued.
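The z for the difference between the two correlations can be approximated with the usual test for independent correlations: Fisher-transform each r and divide the difference by sqrt(1/(n1 - 3) + 1/(n2 - 3)). Taking n = df + 2 for each subgroup (124 studies with unselected subjects, 20 with selected subjects), the sketch below gives |z| of about 1.22, close to the reported 1.24. The small gap may reflect rounding of the published r values or a different convention for n, which the text does not state.

```python
import math


def compare_independent_rs(r1: float, n1: int, r2: float, n2: int) -> float:
    """z test for the difference between two independent Pearson correlations.

    Each r is Fisher-transformed with atanh; the difference is divided by
    sqrt(1/(n1 - 3) + 1/(n2 - 3)).
    """
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / se


# Unselected subjects: r(122) = -.235, so n = 124; selected: r(18) = .077, so n = 20.
z = compare_independent_rs(-0.235, 124, 0.077, 20)
print(f"z = {z:.2f}")
```

With only 20 studies in the selected-subject group, the second term dominates the standard error, which is why even a sign reversal between the groups falls short of significance.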


Influence of Moderating Variables in Combination 


The above analyses examine the impact of each moderating var- 
iable in isolation. In this final set of analyses, we explore their joint 
influence on precognition performance. For this purpose, we iden- 
tify two subgroups of studies. One subgroup is characterized by the 
use of selected subjects tested individually with trial-by-trial feed- 
back. We refer to this as the Optimal group (N = 8 studies). The 
second group is characterized by the use of unselected subjects 
tested in groups with no feedback. We refer to this as the Suboptimal 
group (N = 9 studies). 

The Optimal studies are contributed by four independent investi-
gators and the Suboptimal studies are contributed by two of the
same four investigators. All of the Optimal studies involve short pre-
cognition time intervals (millisecond interval); the Suboptimal stud-
ies involve longer intervals (intervals of weeks or months). All of the
Optimal studies and 5 of the 9 Suboptimal studies use RNG meth-
odology. The two groups do not differ significantly in average sam-
ple size. The mean study quality for the Optimal group is signifi-
cantly higher than that of the Suboptimal studies: Optimal mean =
6.63, SD = 0.92; Suboptimal mean = 3.44, SD = 0.53; t(10) =
8.63, p = 3.3 x 10^-6, two-tailed.
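The reported t of 8.63 can be reproduced from the summary statistics above with a separate-variance (Welch) t test, which gives t of about 8.62 on roughly 10.9 Welch-Satterthwaite degrees of freedom; a pooled-variance test with 15 df would instead give t of about 8.9. This suggests, though the text does not say so, that a separate-variance test was used. A minimal sketch:

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Separate-variance (Welch) t statistic and Welch-Satterthwaite df."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors of each mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Study quality: Optimal (n = 8) vs. Suboptimal (n = 9) groups.
t, df = welch_t(6.63, 0.92, 8, 3.44, 0.53, 9)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The Welch form is the safer choice here because the two groups' quality SDs differ by almost a factor of two.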

The combined impact of the moderating variables appears to be
quite strong (Table 7). Seven of the 8 Optimal studies (87.5%) are
independently significant at the 5% level, whereas none of the
Suboptimal studies are statistically significant. All four investigators
contributing studies to the Optimal group have significant outcomes.*


* In the untrimmed sample of 309 studies, there are a total of 17 Optimal studies.
The mean ES is 0.117 (SD = .154), and the combined z is 15.84. The percentage of
independently significant studies is virtually the same as it is in the trimmed sample:
15 of the 17 studies (88.2%) are significant.
TABLE 7
IMPACT OF MODERATORS IN COMBINATION


                         “Optimal” studies    “Suboptimal” studies

N studies                        8                      9
Combined z                    6.14                  -1.29
Studies with p < .05          87.5%                   0.0%
Mean ES                       .055                   .005
SD (ES)                       .045                   .035

t(15) = 2.61, p = .01
r = .559


These results are quite striking and suggest that future studies 
combining these moderators should yield especially reliable effects. 


SUMMARY AND CONCLUSIONS


Our meta-analysis of forced-choice precognition experiments 
confirms the existence of a small but highly significant precognition 
effect. The effect appears to be replicable; significant outcomes are 
reported by 40 investigators using a variety of methodological par- 
adigms and subject populations. 

The precognition effect is statistically very robust: it remains 
highly significant despite elimination of studies with z scores in the 
upper and lower 10% of the z-score distribution and when a third 
of the remaining investigators—the major contributors of precog- 
nition studies—are eliminated. 

Estimates of the “filedrawer” problem and consideration of para- 
psychological publication practices indicate that the precognition ef- 
fect cannot plausibly be explained on the basis of selective publica- 
tion bias. Analyses of precognition effect sizes in relation to eight 
measures of research quality fail to support the hypothesis that the 
observed effect is driven to any appreciable extent by methodolog- 
ical flaws; indeed, several analyses indicate that methodologically su- 
perior studies yield stronger effects than methodologically weaker 
studies. 

Analyses of parapsychological alternatives to precognition, al- 
though limited to the subset of studies using random number tables, 
provide no support for the hypothesis that the effect results from
the operation of contemporaneous ESP and PK at the time of ran-
domization.

Although the overall precognition effect size is small, this does
not imply that it has no practical consequences. It is, for example,
of the same order of magnitude as effect sizes leading to the early
termination of several major medical research studies. In 1981, the
National Heart, Lung, and Blood Institute discontinued its study of
propranolol because the results were so favorable to the propranolol
treatment that it would be unethical to continue placebo treatment
(Kolata, 1981); the effect size was 0.04. More recently, the Steering
Committee of the Physicians’ Health Study Research Group (1988),
in a widely publicized report, terminated its study of the effects of
aspirin in the prevention of heart attacks for the same reason. The
aspirin group suffered significantly fewer heart attacks than a
placebo control group; the associated effect size was 0.03.
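An effect size of a few hundredths can be made concrete for a two-arm trial with a binary outcome. Assuming the cited values are correlation-type (phi) coefficients, a common way of expressing such trial results, phi can be computed directly from the 2 x 2 table of arm by outcome, and its square equals chi-square / N for the uncorrected Pearson chi-square. The counts below are hypothetical and are not the data from the trials cited above.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for the 2x2 table [[a, b], [c, d]].

    Rows: treatment, control. Columns: event, no event.
    A negative phi means the treatment row has proportionally fewer events.
    """
    num = a * d - b * c
    den = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return num / den


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the same 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical trial: 1,000 patients per arm, 40 vs. 60 events. Not real trial data.
a, b, c, d = 40, 960, 60, 940
phi = phi_coefficient(a, b, c, d)
n = a + b + c + d
print(f"phi = {phi:.3f}, chi-square = {chi_square_2x2(a, b, c, d):.2f}, N = {n}")
```

Even a difference of 4% versus 6% in event rates yields a phi of only a few hundredths, which is the sense in which small effect sizes can still carry practical weight.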

The most important outcome of the meta-analysis is the identi-
fication of several moderating variables that appear to covary sys-
tematically with precognition performance. The largest effects are
observed in studies using subjects selected on the basis of prior test
performance, who are tested individually, and who receive trial-by-
trial feedback. The outcomes of studies combining these factors con-
trast sharply with the null outcomes associated with the combination
of group testing, unselected subjects, and no feedback of results. Be-
cause the two groups of studies were conducted by a subset of the
same investigators, it is unlikely that the observed difference in per-
formance is due to experimenter effects. Indeed, these outcomes
underscore the importance of carefully examining differences in
subject populations, test setting, and so forth, before resorting to
facile “explanations” based on psi-mediated experimenter effects or
the “elusiveness of psi.”

The identification of these moderating variables has important
implications for our understanding of the phenomena and provides
clear direction for future research. The existence of moderating
variables indicates that the precognition effect is not merely an
unexplained departure from a theoretical chance baseline, but
rather is an effect that covaries with factors known to influence
more familiar aspects of human performance. It should now be
possible to exploit these moderating factors to increase the magnitude
and reliability of precognition effects in new studies.